Distributed Tracing
What You Will Learn
- The tracing data model: traces, spans, context, and baggage
- How to set up OpenTelemetry for Python with Jaeger as the backend
- Auto-instrumentation for FastAPI, SQLAlchemy, Redis, and HTTPX
- How to create custom spans for business logic and pipelines
- How W3C trace context propagates through HTTP, and how to do it manually through Kafka
- Sampling strategies: when and how to sample traces in production
- How to inject trace IDs into log lines to correlate logs and traces
Prerequisites
| Requirement | Details |
|---|---|
| Python 3.11+ | asyncio used throughout |
| FastAPI + SQLAlchemy + Redis | Auto-instrumentation targets |
| opentelemetry-sdk and related packages | Full install command below |
| Jaeger | Runs in docker-compose |
| Lessons 01 and 02 complete | Logging and metrics context assumed |
pip install \
opentelemetry-api \
opentelemetry-sdk \
opentelemetry-exporter-otlp-proto-grpc \
opentelemetry-instrumentation-fastapi \
opentelemetry-instrumentation-httpx \
opentelemetry-instrumentation-sqlalchemy \
opentelemetry-instrumentation-redis \
opentelemetry-instrumentation-logging
The Incident: 800ms With No Visible Cause
Three microservices. One user request. Eight hundred milliseconds of total latency.
Individual service logs:
# API Gateway
INFO request received duration_ms=12
# Document Service
INFO document.fetch duration_ms=187
# ML Service
INFO inference.run duration_ms=143
12ms + 187ms + 143ms = 342ms. The user experienced 800ms. Where did the other 458ms go?
Without distributed tracing, this question is unanswerable from logs alone. The gaps between services - serialisation, network transit, queue time, connection setup - are invisible.
With distributed tracing, you open Jaeger and see:
Total: 800ms
├── api-gateway: handle_request [12ms] ████
│ └── document-service: fetch_doc [187ms] ██████████████████████████████████
│ ├── [WAIT: connection pool] [89ms] - this was the actual problem
│ ├── db: SELECT documents [98ms]
│ └── [WAIT: network queue] [112ms] - packets queued at the NIC
│ └── ml-service: run_inference [143ms] ████████████████████████████████
│ └── [WAIT: model warmup] [71ms]
├── [serialize response] [189ms] - JSON serialisation of large doc
└── [network] [57ms]
The 458ms gap now has named causes: 89ms waiting for a database connection, 112ms of network queueing, 189ms of JSON serialisation, and 57ms of network transit. Together these account for 447ms of the gap, and each one points to a specific, actionable fix.
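The arithmetic behind the incident is worth making explicit. The sketch below only restates the numbers from the service logs above; the helper name `unexplained_ms` is made up for illustration.

```python
# Durations copied from the three service logs above (milliseconds).
logged_ms = {"api-gateway": 12, "document-service": 187, "ml-service": 143}
observed_total_ms = 800  # what the user experienced


def unexplained_ms(logged: dict, total: int) -> int:
    """Time the request spent somewhere that no service log accounts for."""
    return total - sum(logged.values())


gap_ms = unexplained_ms(logged_ms, observed_total_ms)
print(f"logged={sum(logged_ms.values())}ms gap={gap_ms}ms")
```

Any time the gap is a large share of the total, per-service logs are measuring the wrong thing, and a trace is the quickest way to see what they miss.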
1. Tracing Concepts
Trace
A trace is the complete record of a request as it travels through a distributed system. Every trace has a globally unique TraceId: a 128-bit value, usually written as 32 hexadecimal characters.
Span
A span is one unit of work within a trace. Every span records:
- SpanId (64-bit, unique within the trace)
- ParentSpanId (the span that created this one; null for the root span)
- Name (e.g., "document-service: fetch_doc")
- StartTime and EndTime
- Status (OK or ERROR, with optional description)
- Attributes (key-value pairs: http.method, db.statement, custom fields)
- Events (timestamped log-like messages within the span)
- Links (references to other traces - useful for async message passing)
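To make the data model concrete, here is a minimal sketch of a span as a plain dataclass. `SpanRecord` is a hypothetical stand-in written for this lesson, not the OpenTelemetry class, and it leaves out Links for brevity.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpanRecord:
    """Hypothetical, simplified span; field names mirror the list above."""
    name: str
    trace_id: int                          # 128-bit, shared by every span in the trace
    span_id: int = field(default_factory=lambda: secrets.randbits(64))
    parent_span_id: Optional[int] = None   # None marks the root span
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None
    status: str = "UNSET"                  # becomes "OK" or "ERROR"
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)  # (timestamp_ns, name) tuples

    def child(self, name: str) -> "SpanRecord":
        """A child keeps the TraceId and points back at this span."""
        return SpanRecord(name=name, trace_id=self.trace_id, parent_span_id=self.span_id)

    def end(self, status: str = "OK") -> None:
        self.end_ns = time.time_ns()
        self.status = status


root = SpanRecord(name="api-gateway: handle_request", trace_id=secrets.randbits(128))
fetch = root.child("document-service: fetch_doc")
fetch.attributes["db.statement"] = "SELECT ..."
fetch.events.append((time.time_ns(), "cache miss"))
fetch.end()
root.end()
```

The parent pointer is what lets a backend rebuild the tree: every span only knows its own parent, and the shared TraceId groups them.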
The Timing Diagram
Time ───────────────────────────────────────────────────────────────►
Trace: abc123
│
├── Span: api-gateway/handle_request (root) ├──────────────────────────────────────────────────┤
│
│ ├── Span: document-service/handle_request │ ├───────────────────────────────────────┤
│ │ ├── Span: db/SELECT │ │ ├──────────────┤
│ │ └── Span: redis/GET │ │ ├─┤
│ │
│ └── Span: ml-service/run_inference │ ├──────────────┤
│ └── Span: model/predict │ ├──────────┤
The traceparent Header
The W3C Trace Context specification defines how trace context flows between services via HTTP headers:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^  ^                                ^                ^
             |  TraceId (128-bit / 32 hex chars) |                flags (01=sampled)
             version                             SpanId (64-bit / 16 hex chars)
When service A calls service B, it injects traceparent into the HTTP request headers. Service B extracts the header and creates a child span that keeps the same TraceId and uses the incoming SpanId as its ParentSpanId.
| Field | Size | Purpose |
|---|---|---|
| version | 8-bit | Always 00 currently |
| trace-id | 128-bit | Unique for the entire distributed trace |
| parent-id | 64-bit | ID of the calling span (becomes ParentSpanId in child) |
| trace-flags | 8-bit | 01 = sampled, 00 = not sampled |
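The four fields in the table can be pulled apart with a few lines of standard-library Python. This is a sketch for understanding the format, limited to the version 00 layout; in a real service the OpenTelemetry propagator does this for you. The names `TraceParent`, `parse_traceparent`, and `format_traceparent` are invented for the example, as is the child span ID `b7ad6b7169203331`.

```python
import re
from typing import NamedTuple

# version-traceid-parentid-flags, all lowercase hex
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> TraceParent:
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, parent_id, flags = match.groups()
    # All-zero IDs are not valid identifiers.
    if int(trace_id, 16) == 0 or int(parent_id, 16) == 0:
        raise ValueError("trace-id and parent-id must be non-zero")
    # Bit 0 of the flags byte is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


def format_traceparent(trace_id: str, span_id: str, sampled: bool) -> str:
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


# Service B receives the header from service A ...
incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
# ... and, when it calls service C, sends the same TraceId with its own SpanId.
outgoing = format_traceparent(incoming.trace_id, "b7ad6b7169203331", incoming.sampled)
```

Note what changes and what does not between hops: the TraceId and sampled flag are carried through unchanged, while the parent-id field is replaced with the ID of whichever span makes the next call.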
2. OpenTelemetry Python Setup
OpenTelemetry (OTel) is the vendor-neutral standard for distributed tracing (and now also metrics and logs). It supersedes OpenTracing and OpenCensus, the two projects that merged to form it.
Architecture
Python Service
┌─────────────────────────────────────┐
│ TracerProvider │
│ ├── Sampler (decides what to trace)│
│ ├── SpanProcessor │
│ │ └── BatchSpanProcessor │
│ │ └── OTLPSpanExporter ─────────────► OpenTelemetry Collector
│ └── Resource (service metadata) │ │
└─────────────────────────────────────┘ │ OTLP
▼
Jaeger
(trace storage + UI)
Full Setup Module
# app/tracing.py
"""
OpenTelemetry tracing setup.
Call setup_tracing() once at application startup, before any
instrumentation libraries are initialised.
"""
import os
from typing import Optional
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
BatchSpanProcessor,
ConsoleSpanExporter,
)
from opentelemetry.sdk.trace.sampling import (
ALWAYS_ON,
TraceIdRatioBased,
ParentBased,
)
from opentelemetry.sdk.resources import (
Resource,
SERVICE_NAME,
SERVICE_VERSION,
DEPLOYMENT_ENVIRONMENT,
)
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.instrumentation.redis import RedisInstrumentor
from opentelemetry.instrumentation.logging import LoggingInstrumentor
def setup_tracing(
    service_name: str,
    service_version: str,
    environment: str,
    otlp_endpoint: str = "http://localhost:4317",
    sample_rate: float = 1.0,
    console_export: bool = False,
) -> TracerProvider:
    """
    Initialise OpenTelemetry tracing for a FastAPI service.

    Args:
        service_name: Name of this service (e.g., "document-api")
        service_version: Version string (e.g., "2.14.0")
        environment: Deployment environment (e.g., "production")
        otlp_endpoint: OTLP gRPC endpoint for the collector/Jaeger
        sample_rate: Fraction of traces to sample (1.0 = all, 0.1 = 10%)
        console_export: Also print spans to stdout (useful for debugging)

    Returns:
        The configured TracerProvider (also set as global)
    """
    # Resource: metadata attached to every span from this service
    resource = Resource.create({
        SERVICE_NAME: service_name,
        SERVICE_VERSION: service_version,
        DEPLOYMENT_ENVIRONMENT: environment,
        "host.name": os.uname().nodename,
        "process.pid": os.getpid(),
    })

    # Sampler: ParentBased respects the sampling decision of the upstream service.
    # If the upstream sampled the trace, we continue sampling it.
    # If the upstream did not sample it, we drop it as well.
    # Only root spans (no parent) are sampled at our own rate.
    if sample_rate >= 1.0:
        sampler = ALWAYS_ON
    else:
        sampler = ParentBased(root=TraceIdRatioBased(sample_rate))

    # Provider: the central object that creates tracers
    provider = TracerProvider(
        resource=resource,
        sampler=sampler,
    )

    # Exporter: sends spans to the backend
    otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint)
    provider.add_span_processor(
        BatchSpanProcessor(
            otlp_exporter,
            max_queue_size=2048,
            max_export_batch_size=512,
            schedule_delay_millis=5000,  # Export every 5 seconds
            export_timeout_millis=10000,
        )
    )
    if console_export:
        provider.add_span_processor(
            BatchSpanProcessor(ConsoleSpanExporter())
        )

    # Set as the global provider - all calls to trace.get_tracer() use this
    trace.set_tracer_provider(provider)
    return provider
def instrument_app(app, engine=None, redis_client=None) -> None:
    """
    Apply auto-instrumentation to the FastAPI app and its dependencies.
    Call AFTER setup_tracing() and BEFORE the app starts handling requests.
    """
    # FastAPI: instruments all routes, adds span for each request
    FastAPIInstrumentor.instrument_app(
        app,
        server_request_hook=_server_request_hook,
        client_request_hook=None,
        client_response_hook=None,
    )
    # HTTPX: instruments all outbound HTTP calls made with httpx.
    # Depending on the instrumentation version, httpx.AsyncClient may need the
    # separate async_request_hook / async_response_hook arguments.
    HTTPXClientInstrumentor().instrument(
        request_hook=_outbound_request_hook,
        response_hook=_outbound_response_hook,
    )
    # SQLAlchemy: instruments all database queries.
    # For an AsyncEngine, pass engine.sync_engine rather than the async wrapper.
    if engine is not None:
        SQLAlchemyInstrumentor().instrument(
            engine=engine,
            enable_commenter=True,  # Adds trace ID comment to SQL queries
        )
    # Redis: instruments all redis operations. The instrumentor patches the
    # redis library globally; redis_client only acts as an opt-in flag here.
    if redis_client is not None:
        RedisInstrumentor().instrument()
    # Logging: injects trace_id and span_id into stdlib log records
    # This enables log-to-trace correlation without manual processor code
    LoggingInstrumentor().instrument(set_logging_format=True)
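def _server_request_hook(span, scope):
    """Add custom attributes to every inbound request span."""
    if span and span.is_recording():
        # Add request metadata from ASGI scope
        if "headers" in scope:
            headers = dict(scope["headers"])
            if b"x-user-id" in headers:
                span.set_attribute("user.id", headers[b"x-user-id"].decode())


def _outbound_request_hook(span, request):
    """Add custom attributes to every outbound HTTP request span."""
    if span and span.is_recording():
        span.set_attribute("http.request.url", str(request.url))


def _outbound_response_hook(span, request, response):
    """Add custom attributes once the outbound HTTP response arrives.

    instrument_app() registers this hook, so it must be defined here.
    The attribute name below is a custom one chosen for this lesson.
    """
    if span and span.is_recording():
        span.set_attribute(
            "app.upstream.status_class", f"{response.status_code // 100}xx"
        )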
FastAPI Integration
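# app/main.py
from contextlib import asynccontextmanager
from fastapi import FastAPI
from sqlalchemy.ext.asyncio import create_async_engine
from app.tracing import setup_tracing, instrument_app

engine = create_async_engine("postgresql+asyncpg://...")

# 1. Set up tracing provider FIRST
provider = setup_tracing(
    service_name="document-api",
    service_version="2.14.0",
    environment="production",
    otlp_endpoint="http://otel-collector:4317",
    sample_rate=0.1,  # Sample 10% of traces in production
)


@asynccontextmanager
async def lifespan(app: FastAPI):
    yield
    # On shutdown, flush spans still queued in the BatchSpanProcessor
    provider.shutdown()


app = FastAPI(lifespan=lifespan)

# 2. Instrument the app and its dependencies before it serves traffic.
#    This runs at import time rather than inside lifespan, because
#    middleware generally cannot be added once the application has started.
#    The SQLAlchemy instrumentor hooks the synchronous engine that sits
#    underneath an AsyncEngine, hence engine.sync_engine.
instrument_app(app, engine=engine.sync_engine)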
docker-compose Setup
# docker-compose.yml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.96.0
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
    volumes:
      - ./config/otel-collector.yaml:/etc/otelcol-contrib/config.yaml
  jaeger:
    image: jaegertracing/all-in-one:1.55
    ports:
      - "16686:16686" # Jaeger UI
      - "14250:14250" # Collector gRPC
    environment:
      - COLLECTOR_OTLP_ENABLED=true
# config/otel-collector.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 5s
    send_batch_size: 1000
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
exporters:
  # The dedicated jaeger exporter is no longer shipped in recent Collector
  # releases, so send OTLP to Jaeger's OTLP gRPC port (4317) instead.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]
3. Auto-Instrumentation: What Spans Are Generated
When you apply FastAPIInstrumentor, SQLAlchemyInstrumentor, RedisInstrumentor, and HTTPXClientInstrumentor, these are the spans generated automatically for a typical request:
POST /api/documents (FastAPI root span)
│ http.method: POST
│ http.url: /api/documents
│ http.status_code: 200
│ net.peer.ip: 10.0.0.1
│
├── SELECT documents WHERE id=? (SQLAlchemy span)
│ db.system: postgresql
│ db.statement: SELECT documents.id, documents.content_type ...
│ db.name: myapp_prod
│
├── GET document_cache:doc_8f3a (Redis span)
│ db.system: redis
│ db.statement: GET
│ net.peer.name: redis
│ net.peer.port: 6379
│
└── POST https://api.openai.com/v1/embeddings (HTTPX span)
http.method: POST
http.url: https://api.openai.com/v1/embeddings
http.status_code: 200
http.response_content_length: 4096
This is already extremely useful for root cause analysis - and it requires zero application code changes beyond the setup call.
4. Custom Spans
Auto-instrumentation covers I/O. Your business logic - document parsing, validation, ML pipeline stages - is invisible without custom spans.
Creating Custom Spans
# app/services/document_processor.py
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode
import structlog

log = structlog.get_logger()
tracer = trace.get_tracer(__name__)  # Module-level tracer


class DocumentProcessor:
    """
    Processes documents through a multi-stage pipeline.
    Each stage gets its own span for individual timing.
    """

    # "Document" is the application's own model class (import not shown).
    async def process(self, doc_bytes: bytes, filename: str) -> "Document":
        # The auto-instrumented FastAPI span is already active.
        # This span becomes a child of it automatically.
        with tracer.start_as_current_span(
            "document.process",
            attributes={
                "document.filename": filename,
                "document.size_bytes": len(doc_bytes),
            },
            # Status and exception are recorded explicitly below, so turn off
            # the automatic handling to avoid a duplicate exception event.
            record_exception=False,
            set_status_on_exception=False,
        ) as span:
            try:
                doc = await self._run_pipeline(doc_bytes, filename, span)
                span.set_status(Status(StatusCode.OK))
                return doc
            except Exception as exc:
                span.set_status(
                    Status(StatusCode.ERROR, description=str(exc))
                )
                span.record_exception(exc)
                raise
    async def _run_pipeline(
        self,
        doc_bytes: bytes,
        filename: str,
        parent_span,
    ) -> "Document":
        # Stage 1: Content Type Detection
        with tracer.start_as_current_span("document.detect_content_type") as span:
            content_type = await self._detect_content_type(doc_bytes)
            span.set_attribute("document.content_type", content_type)

        # Stage 2: Text Extraction (most expensive stage)
        with tracer.start_as_current_span("document.extract_text") as span:
            span.set_attribute("document.content_type", content_type)
            text = await self._extract_text(doc_bytes, content_type)
            span.set_attribute("document.char_count", len(text))
            span.set_attribute("document.extraction_engine", "pdfplumber")

        # Stage 3: Chunking
        with tracer.start_as_current_span("document.chunk") as span:
            chunks = await self._chunk_text(text)
            span.set_attribute("document.chunk_count", len(chunks))
            span.set_attribute("document.avg_chunk_size", len(text) // max(len(chunks), 1))

        # Stage 4: Embedding (calls external API - auto-instrumented by HTTPX)
        with tracer.start_as_current_span("document.embed") as span:
            span.set_attribute("document.chunk_count", len(chunks))
            embeddings = await self._embed_chunks(chunks)
            span.set_attribute("embedding.dimensions", len(embeddings[0]) if embeddings else 0)

        # Stage 5: Store
        with tracer.start_as_current_span("document.store") as span:
            doc = await self._store(filename, text, chunks, embeddings, content_type)
            span.set_attribute("document.id", doc.id)
        return doc
    async def _extract_text(self, doc_bytes: bytes, content_type: str) -> str:
        span = trace.get_current_span()
        if content_type == "application/pdf":
            # Add a span event for significant moments within a span
            span.add_event(
                "pdf.open",
                attributes={"size_bytes": len(doc_bytes)},
            )
            text = await self._extract_pdf_text(doc_bytes)
            span.add_event(
                "pdf.extracted",
                attributes={"char_count": len(text)},
            )
        elif content_type == "text/plain":
            text = doc_bytes.decode("utf-8")
        else:
            raise ValueError(f"Unsupported content type: {content_type}")
        return text
Span Attributes vs Span Events
| Feature | Span Attributes | Span Events |
|---|---|---|
| Purpose | Static properties of the operation | Time-stamped occurrences within the span |
| Example | http.method = "POST", db.name = "prod" | "pdf page 5 parsed", "cache miss" |
| Timestamp | None of its own; the value describes the span as a whole | Has its own timestamp within the span |
| Use for | Characterising the span for filtering | Recording moments within a long operation |
| Cardinality concern | Yes - high-cardinality attributes hurt backends | Less so - events are per-trace not per-series |
# Attributes: static properties
span.set_attribute("model.name", "gpt-4")
span.set_attribute("model.version", "turbo-2024")
span.set_attribute("request.tokens", 1024)
# Events: things that happened during the span
span.add_event("model.called", {"timestamp_iso": "2026-03-07T09:14:32Z"})
span.add_event("model.response.received", {"tokens_used": 837})
span.add_event("rate_limit.hit", {"retry_after_seconds": 5})
5. Context Propagation
Context propagation is how tracing works across services. Without it, each service would start a new, disconnected trace.
HTTP: Automatic with HTTPX
When HTTPXClientInstrumentor is active, every httpx.AsyncClient request automatically injects the traceparent (and tracestate) header:
import httpx

# HTTPX was already instrumented by instrument_app(); nothing to set up here.
# classify() is a hypothetical wrapper so the snippet is valid Python.
async def classify(text: str) -> httpx.Response:
    async with httpx.AsyncClient() as client:
        # traceparent header is injected automatically
        response = await client.post(
            "http://ml-service/api/predict",
            json={"text": text},
        )
    # The ml-service receives traceparent and continues the same trace
    return response
The outgoing request will have headers like these (tracestate is included only when the trace state has entries):
POST /api/predict HTTP/1.1
Host: ml-service
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
Content-Type: application/json
Kafka: Manual Header Injection
Message queues do not have automatic instrumentation for all use cases. You need to manually inject and extract trace context.
# app/messaging/kafka_producer.py
import json

from kafka import KafkaProducer
from opentelemetry import trace, propagate

tracer = trace.get_tracer(__name__)
producer = KafkaProducer(bootstrap_servers=["kafka:9092"])


def send_document_event(document_id: str, event_type: str) -> None:
    """
    Send a Kafka message with trace context in headers
    so the consumer can continue the trace.
    """
    with tracer.start_as_current_span(
        "kafka.produce",
        kind=trace.SpanKind.PRODUCER,
        attributes={
            "messaging.system": "kafka",
            "messaging.destination": "document-events",
            "messaging.operation": "send",
        },
    ) as span:
        # Collect the current trace context into a carrier dict.
        # propagate.inject() uses the globally configured propagator.
        carrier = {}
        propagate.inject(carrier)  # Fills carrier with traceparent (and tracestate if set)

        # Convert to Kafka headers format: list of (key, bytes) tuples
        headers = [
            (key, value.encode("utf-8"))
            for key, value in carrier.items()
        ]
        payload = json.dumps({
            "document_id": document_id,
            "event_type": event_type,
        }).encode("utf-8")
        producer.send(
            "document-events",
            value=payload,
            headers=headers,
        )
        span.set_attribute("messaging.message_id", document_id)
# app/messaging/kafka_consumer.py
import json

from kafka import KafkaConsumer
from opentelemetry import trace, propagate

tracer = trace.get_tracer(__name__)
consumer = KafkaConsumer("document-events", bootstrap_servers=["kafka:9092"])


def consume_messages():
    for message in consumer:
        # Extract trace context from Kafka headers.
        # kafka-python delivers header keys as str and values as bytes;
        # the isinstance check also tolerates clients that return bytes keys.
        carrier = {
            (key.decode("utf-8") if isinstance(key, bytes) else key): value.decode("utf-8")
            for key, value in (message.headers or [])
            if value is not None
        }
        context = propagate.extract(carrier)

        # Start a span as a child of the producer's span
        with tracer.start_as_current_span(
            "kafka.consume",
            context=context,
            kind=trace.SpanKind.CONSUMER,
            attributes={
                "messaging.system": "kafka",
                "messaging.source": "document-events",
                "messaging.operation": "receive",
            },
        ) as span:
            data = json.loads(message.value)
            span.set_attribute("document.id", data["document_id"])
            # process_document_event is the application's own handler (not shown)
            process_document_event(data)
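The only Kafka-specific part of the producer and consumer code is the conversion between a string carrier and Kafka's header list. Pulling that into two small helpers makes it testable without a broker. This is a sketch: the helper names are made up, and it assumes headers arrive as a list of (key, value) pairs with bytes values, which is what kafka-python provides; other clients may differ.

```python
from typing import Dict, List, Optional, Tuple, Union

KafkaHeaders = List[Tuple[Union[str, bytes], Optional[bytes]]]


def carrier_to_headers(carrier: Dict[str, str]) -> KafkaHeaders:
    """Encode a propagation carrier as Kafka headers: (str key, bytes value)."""
    return [(key, value.encode("utf-8")) for key, value in carrier.items()]


def headers_to_carrier(headers: Optional[KafkaHeaders]) -> Dict[str, str]:
    """Decode Kafka headers back into a carrier, skipping null values."""
    carrier: Dict[str, str] = {}
    for key, value in headers or []:
        if value is None:
            continue
        # Tolerate clients that hand back bytes keys as well as str keys.
        name = key.decode("utf-8") if isinstance(key, bytes) else key
        carrier[name] = value.decode("utf-8")
    return carrier


example = {
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "tracestate": "congo=t61rcWkgMzE",
}
round_tripped = headers_to_carrier(carrier_to_headers(example))
```

With helpers like these, the producer would pass `carrier_to_headers(carrier)` to `producer.send(...)` and the consumer would pass `headers_to_carrier(message.headers)` to `propagate.extract(...)`.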
The W3C Trace Context Spec
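Three headers carry context between services. traceparent and tracestate are defined by the W3C Trace Context specification; baggage is defined by the separate W3C Baggage specification:
| Header | Format | Example |
|---|---|---|
| traceparent | {version}-{traceid}-{parentid}-{flags} | 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 |
| tracestate | Vendor-specific key-value pairs | dd=s:2;o:rum,congo=t61rcWkgMzE |
| baggage | W3C Baggage key-value pairs | userId=alice,serverNode=iad-2,isProduction=false |
OpenTelemetry's default global propagator is a composite of the W3C Trace Context propagator (traceparent, tracestate) and the W3C Baggage propagator, so propagate.inject() and propagate.extract() handle all three headers by default.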
6. Baggage
Baggage is a key-value store that propagates with the trace context across service boundaries. Unlike span attributes (which are only visible in that span), baggage values are available to all services in the call chain.
Use cases:
- user.tier = "enterprise" - downstream services apply different rate limits
- feature.flag = "new_chunker" - all services log which feature flag variant is active
- ab.variant = "B" - correlate all spans in a trace to an A/B test variant
# app/middleware/baggage_middleware.py
from opentelemetry import baggage, context
from starlette.middleware.base import BaseHTTPMiddleware


class BaggageMiddleware(BaseHTTPMiddleware):
    """
    Inject user-tier and feature flags into baggage so
    downstream services can access them without passing them
    as explicit parameters.
    """

    async def dispatch(self, request, call_next):
        # Set baggage values - these propagate to all downstream services
        ctx = baggage.set_baggage("user.tier", "enterprise")
        ctx = baggage.set_baggage(
            "feature.new_chunker",
            "enabled",
            context=ctx,
        )
        token = context.attach(ctx)
        try:
            return await call_next(request)
        finally:
            context.detach(token)


# In any downstream service, read baggage:
user_tier = baggage.get_baggage("user.tier")
if user_tier == "enterprise":
    apply_enterprise_rate_limit()
Baggage Caution
Baggage propagates to all services, including third-party ones. Never put sensitive data (PII, credentials) in baggage. Keep baggage small - it is included in every HTTP request header.
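To see why baggage size matters, it helps to look at what actually goes on the wire. The sketch below encodes and decodes a baggage header with the standard library, assuming the simple `key=value,key=value` form with percent-encoded values. The function names and the 512-byte budget are hypothetical choices for this example, not limits from the specification; in a real service the OpenTelemetry baggage propagator handles the encoding.

```python
from urllib.parse import quote, unquote

MAX_BAGGAGE_BYTES = 512  # hypothetical team budget, not a spec limit


def encode_baggage(entries: dict) -> str:
    """Serialise entries as key=value pairs, percent-encoding the values."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def decode_baggage(header: str) -> dict:
    """Parse a baggage header, ignoring optional ;property suffixes."""
    entries = {}
    for member in header.split(","):
        member = member.strip()
        if not member or "=" not in member:
            continue
        key, _, value = member.partition("=")
        value = value.split(";", 1)[0]  # drop any properties after the value
        entries[key.strip()] = unquote(value.strip())
    return entries


def within_budget(header: str, limit: int = MAX_BAGGAGE_BYTES) -> bool:
    """True when the serialised header fits the per-request byte budget."""
    return len(header.encode("utf-8")) <= limit


header = encode_baggage({"user.tier": "enterprise", "feature.flag": "new chunker"})
```

Every byte of that string is repeated on every outbound call in the trace, which is the practical reason to keep baggage to a handful of short entries.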
7. Sampling Strategies
In production, you cannot trace every request. At 10,000 requests/second, full tracing generates millions of spans per minute - too expensive to store and too slow to export.
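A quick estimate shows where "millions of spans per minute" comes from. The figure of 8 spans per request is an assumption for illustration; measure your own services to get a real number.

```python
def spans_per_minute(
    requests_per_second: float,
    spans_per_request: float,
    sample_rate: float = 1.0,
) -> float:
    """Rough span volume: requests x spans per request x fraction sampled."""
    return requests_per_second * 60 * spans_per_request * sample_rate


# Assumed workload: 10,000 req/s, 8 spans per request (hypothetical).
full = spans_per_minute(10_000, spans_per_request=8)
sampled = spans_per_minute(10_000, spans_per_request=8, sample_rate=0.01)
print(f"100% sampling: {full:,.0f} spans/min")
print(f"  1% sampling: {sampled:,.0f} spans/min")
```

Span volume scales with the number of instrumented operations per request, so adding custom spans raises storage cost just as more traffic does.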
Sampler Types
from opentelemetry.sdk.trace.sampling import (
ALWAYS_ON, # Sample everything - dev/testing only
ALWAYS_OFF, # Sample nothing - disable tracing
TraceIdRatioBased, # Deterministically sample N% of traces by TraceId hash
ParentBased, # Defer to upstream's decision; apply own rate for new traces
)
# Development: sample everything
sampler = ALWAYS_ON
# Production: sample 10% of new traces; always continue parent's sampling decision
sampler = ParentBased(root=TraceIdRatioBased(0.10))
# High-traffic service: sample 1%
sampler = ParentBased(root=TraceIdRatioBased(0.01))
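"Deterministically sample by TraceId" means the decision is a pure function of the trace ID: no random draw happens at sampling time. The sketch below models that idea with the standard library, assuming the low 64 bits of the ID are compared against a threshold of rate x 2^64. It is a simplified model for intuition; the SDK's exact threshold arithmetic may differ.

```python
import random

_MASK_64 = (1 << 64) - 1


def ratio_sampled(trace_id: int, rate: float) -> bool:
    """Keep the trace when the low 64 bits of its ID fall under rate * 2**64."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    bound = round(rate * (1 << 64))
    return (trace_id & _MASK_64) < bound


# Simulate 20,000 random 128-bit trace IDs (seeded so the run is repeatable).
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(20_000)]
kept = sum(ratio_sampled(t, 0.10) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the ID, two services configured with the same rate reach the same verdict for the same trace, even without talking to each other.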
Rate-Limiting Sampler
TraceIdRatioBased samples a percentage, but in bursts you might still generate too many traces. A rate-limiting sampler caps traces per second:
# app/tracing/rate_limiting_sampler.py
import time
import threading

from opentelemetry.sdk.trace.sampling import Sampler, SamplingResult, Decision
from opentelemetry.trace.span import TraceState


class RateLimitingSampler(Sampler):
    """
    Samples at most `max_traces_per_second` traces per second.
    Uses a token bucket algorithm.

    The SDK consults the sampler for every span, not once per trace.
    Wrap this class as ParentBased(root=RateLimitingSampler(...)) so the
    bucket is only consulted for root spans and child spans follow their
    parent's decision.
    """

    def __init__(self, max_traces_per_second: float = 5.0):
        self._max_traces_per_second = max_traces_per_second
        self._tokens = max_traces_per_second
        self._last_refill = time.monotonic()
        self._lock = threading.Lock()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self._last_refill
        self._tokens = min(
            self._max_traces_per_second,
            self._tokens + elapsed * self._max_traces_per_second,
        )
        self._last_refill = now

    def should_sample(
        self,
        parent_context,
        trace_id,
        name,
        kind=None,
        attributes=None,
        links=None,
        trace_state=None,
    ) -> SamplingResult:
        with self._lock:
            self._refill()
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return SamplingResult(
                    decision=Decision.RECORD_AND_SAMPLE,
                    attributes=attributes,
                    trace_state=trace_state or TraceState(),
                )
            return SamplingResult(
                decision=Decision.DROP,
                attributes=None,
                trace_state=trace_state or TraceState(),
            )

    def get_description(self) -> str:
        return f"RateLimitingSampler({self._max_traces_per_second}/s)"
Sampling Strategy Decision Table
| Scenario | Recommended Sampler | Sample Rate |
|---|---|---|
| Development / local | ALWAYS_ON | 100% |
| Staging | ParentBased(TraceIdRatioBased) | 100% |
| Production, low traffic (<100 req/s) | ParentBased(TraceIdRatioBased) | 100% |
| Production, medium traffic (100–1000 req/s) | ParentBased(TraceIdRatioBased) | 10% |
| Production, high traffic (>1000 req/s) | ParentBased(TraceIdRatioBased) | 1% + RateLimitingSampler |
| Always trace errors | Tail-based sampling (OTel Collector tail_sampling processor) | 100% errors, 1% successes |
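The head-sampling rows of the table can be encoded as a small helper, which is handy when the rate is set from deployment configuration. This sketch simply restates the table's thresholds; the function name and the environment strings are assumptions for illustration.

```python
def recommended_sample_rate(environment: str, requests_per_second: float) -> float:
    """Map environment and traffic level to a head-sampling rate (per the table)."""
    if environment in ("development", "staging"):
        return 1.0
    if requests_per_second < 100:
        return 1.0    # low traffic: keep everything
    if requests_per_second <= 1000:
        return 0.10   # medium traffic
    return 0.01       # high traffic: also consider a rate limit on top


rate = recommended_sample_rate("production", requests_per_second=450)
```

The returned value is meant to be passed as the `sample_rate` argument of a setup function such as the one in this lesson.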
Tail-Based Sampling Concept
Head-based sampling (what we have described so far) makes the sampling decision at the beginning of a trace, before you know if it will be slow or errored. This means you might drop 99% of requests and accidentally drop the one slow request you needed.
Tail-based sampling makes the decision after the trace completes. It keeps all error traces and all traces above a latency threshold, and samples the rest. This requires a dedicated component (e.g., OpenTelemetry Collector with the tail_sampling processor):
# config/otel-collector.yaml (tail sampling example)
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow_traces
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic
        type: probabilistic
        probabilistic: {sampling_percentage: 1}
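The three policies above amount to a simple rule applied once a trace is complete. The following sketch models that rule in plain Python, assuming a trace is kept when any one policy matches. `FinishedTrace` and `keep_trace` are hypothetical names; this is not the Collector's implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class FinishedTrace:
    """Hypothetical summary of a completed trace."""
    has_error: bool
    duration_ms: float


def keep_trace(
    trace: FinishedTrace,
    *,
    latency_threshold_ms: float = 1000.0,
    baseline_percentage: float = 1.0,
    rng=random,
) -> bool:
    """Keep errors and slow traces; keep a small random share of the rest."""
    if trace.has_error:
        return True                      # policy: errors
    if trace.duration_ms > latency_threshold_ms:
        return True                      # policy: slow_traces
    return rng.random() * 100.0 < baseline_percentage  # policy: probabilistic


decisions = [
    keep_trace(FinishedTrace(has_error=True, duration_ms=40)),
    keep_trace(FinishedTrace(has_error=False, duration_ms=2300)),
]
```

The cost of this approach is memory: every span of every trace has to be buffered until the decision is made, which is what `decision_wait` and `num_traces` bound in the Collector configuration.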
8. Reading Jaeger
Open http://localhost:16686 to access the Jaeger UI.
Finding a Trace
Search:
Service: document-api
Operation: POST /api/documents
Min Duration: 1s ← filter for slow traces
Lookback: Last 1 hour
Reading the Waterfall
A waterfall diagram shows spans as horizontal bars. The key things to look for:
TRACE: abc123 Total: 1,891ms
────────────────────────────────────────────────────────────
POST /api/documents ████████████████████████████████ 1,891ms
document.process ████████████████████████████████ 1,878ms
document.detect_content_type █ 23ms
document.extract_text ████████████ 634ms
[GAP - between extract and chunk] ████ 201ms ← SUSPICIOUS
document.chunk ███ 87ms
document.embed ████████████████ 843ms
POST https://api.openai.com/... ████████████████ 841ms
document.store ██ 89ms
SELECT documents ... █ 12ms
INSERT documents ... █ 43ms
The 201ms gap between extract_text and chunk is visible in the waterfall even though it does not appear in any span - it is time spent in Python code between the two with tracer.start_as_current_span() blocks. This is a key advantage of tracing over logging: the gaps are visible.
The embed span at 843ms is almost entirely consumed by the OpenAI API call (841ms). The fix: batch requests or add caching.
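Spotting gaps by eye works for one trace; for many traces it helps to compute them. The sketch below finds uninstrumented time between consecutive sibling spans. The `Bar` type and `sibling_gaps` function are illustrative names, and the start offsets are reconstructed from the durations in the waterfall above on the assumption that the stages run strictly one after another.

```python
from typing import List, NamedTuple, Tuple


class Bar(NamedTuple):
    """One bar of the waterfall: a span name with start/end offsets in ms."""
    name: str
    start_ms: float
    end_ms: float


def sibling_gaps(spans: List[Bar], min_gap_ms: float = 50.0) -> List[Tuple[str, str, float]]:
    """Return (previous, next, gap_ms) for gaps of at least min_gap_ms."""
    ordered = sorted(spans, key=lambda s: s.start_ms)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.start_ms - prev.end_ms
        if gap >= min_gap_ms:
            gaps.append((prev.name, nxt.name, gap))
    return gaps


# Offsets derived from the durations shown above (assumed sequential stages).
stages = [
    Bar("document.detect_content_type", 0, 23),
    Bar("document.extract_text", 23, 657),
    Bar("document.chunk", 858, 945),
    Bar("document.embed", 945, 1788),
    Bar("document.store", 1788, 1877),
]
suspicious = sibling_gaps(stages)
```

Each reported gap is a candidate for a new custom span: wrap the code that runs between the two stages and the next trace will show what it was doing.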
Comparing Two Traces
Jaeger allows selecting two traces and diffing them. This is invaluable for questions like "what's different between a fast and slow request with the same route?"
9. Connecting Traces to Logs
The final step in the observability triad: linking a log line to the trace that caused it.
structlog Processor for Trace IDs
# app/logging/otel_processor.py
from typing import Any

from structlog.types import EventDict


def inject_trace_context(
    logger: Any, method: str, event_dict: EventDict
) -> EventDict:
    """
    Inject the current OpenTelemetry trace ID and span ID into the log record.
    When a log line is viewed in Loki/Kibana, you can click the trace_id
    to jump directly to the corresponding trace in Jaeger.
    """
    try:
        from opentelemetry import trace

        span = trace.get_current_span()
        if span.is_recording():
            ctx = span.get_span_context()
            # Format as 32-char hex for trace ID, 16-char for span ID
            event_dict["trace_id"] = format(ctx.trace_id, "032x")
            event_dict["span_id"] = format(ctx.span_id, "016x")
            # W3C traceparent format for easy correlation
            event_dict["traceparent"] = (
                f"00-{format(ctx.trace_id, '032x')}"
                f"-{format(ctx.span_id, '016x')}"
                f"-{'01' if ctx.trace_flags.sampled else '00'}"
            )
    except ImportError:
        pass
    return event_dict
Add this processor to your structlog configuration (in logging_config.py):
# In setup_logging(), add before the renderer:
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.stdlib.add_log_level,
        structlog.stdlib.add_logger_name,
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        inject_trace_context,  # ← Add this
        structlog.processors.format_exc_info,
        mask_sensitive_data,
        structlog.processors.JSONRenderer(),
    ],
    ...
)
Log Lines with Trace IDs
Every log line emitted inside a recording span now carries trace_id and span_id:
{
"timestamp": "2026-03-07T09:14:33.891Z",
"level": "error",
"event": "document.extract_text.failed",
"filename": "report.pdf",
"error": "PDFSyntaxError: EOF marker not found",
"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
"span_id": "00f067aa0ba902b7",
"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
"request_id": "req_7e9d3b",
"service": "document-api"
}
Grafana: From Log to Trace in One Click
In Grafana, when you configure Loki as a data source and Jaeger as a trace data source, you can link them:
// In the Loki datasource configuration (Grafana provisioning)
{
"name": "Loki",
"type": "loki",
"url": "http://loki:3100",
"jsonData": {
"derivedFields": [
{
"matcherRegex": "\"trace_id\":\\s*\"(\\w+)\"",
"name": "TraceID",
"url": "${__value.raw}",
"datasourceUid": "jaeger-uid",
"urlDisplayLabel": "View Trace in Jaeger"
}
]
}
}
Now when you view a log line in Grafana Loki that has a trace_id, a "View Trace in Jaeger" button appears inline. One click takes you from the log line to the full distributed trace in Jaeger.
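A derived field only works if its regex matches the log lines your renderer actually produces, so it is worth checking the pattern against real output before provisioning it. Python's `json.dumps` puts a space after each colon by default, which means a matcher that tolerates optional whitespace is the safer choice. The snippet below is a quick local check; the pattern shown is a suggestion, not a Grafana requirement.

```python
import json
import re

# Tolerates both "trace_id":"..." and "trace_id": "..."
MATCHER = re.compile(r'"trace_id":\s*"(\w+)"')

log_line = json.dumps({
    "level": "error",
    "event": "document.extract_text.failed",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
})

match = MATCHER.search(log_line)
trace_id = match.group(1) if match else None
print(trace_id)
```

If the check prints None for a line that does contain a trace ID, the "View Trace in Jaeger" link will silently fail to appear in Grafana, so this is a cheap test to keep next to the logging configuration.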
Complete OpenTelemetry Integration Test
# tests/test_tracing.py
import pytest
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

# The global TracerProvider can only be set once per process; later calls
# are ignored with a warning. Configure it once at import time and hand
# each test a freshly cleared exporter.
_exporter = InMemorySpanExporter()
_provider = TracerProvider()
_provider.add_span_processor(SimpleSpanProcessor(_exporter))
trace.set_tracer_provider(_provider)


@pytest.fixture
def trace_exporter():
    """
    In-memory span exporter for testing.
    Captures all spans created during the test.
    """
    _exporter.clear()
    yield _exporter
    _exporter.clear()
def test_document_processor_creates_spans(trace_exporter):
    """Verify the document processor creates the expected span hierarchy."""
    from app.services.document_processor import DocumentProcessor
    import asyncio

    processor = DocumentProcessor()
    asyncio.run(processor.process(b"test content", "test.txt"))

    spans = trace_exporter.get_finished_spans()
    span_names = [s.name for s in spans]
    assert "document.process" in span_names
    assert "document.detect_content_type" in span_names
    assert "document.extract_text" in span_names
    assert "document.chunk" in span_names
    assert "document.store" in span_names

    # Verify parent-child relationships
    root_span = next(s for s in spans if s.name == "document.process")
    child_spans = [
        s for s in spans
        if s.parent and s.parent.span_id == root_span.context.span_id
    ]
    assert len(child_spans) >= 4
def test_error_span_has_error_status(trace_exporter):
    """Verify that exceptions set span status to ERROR."""
    from app.services.document_processor import DocumentProcessor
    import asyncio

    processor = DocumentProcessor()
    with pytest.raises(ValueError):
        asyncio.run(processor.process(b"", "corrupted.pdf"))

    spans = trace_exporter.get_finished_spans()
    root_span = next(s for s in spans if s.name == "document.process")
    from opentelemetry.trace import StatusCode
    assert root_span.status.status_code == StatusCode.ERROR
Interview Questions and Answers
Q1: A distributed trace shows a total duration of 2 seconds, but the sum of all individual spans is only 1.2 seconds. Is this a bug in the tracing instrumentation?
No - this is expected and represents uninstrumented time. The "missing" 800ms is time the application spent in code paths that do not have spans: Python interpreter overhead, garbage collection pauses, context switches, time between with tracer.start_as_current_span() blocks, and library code that is not instrumented. This gap is actually one of the most valuable signals tracing provides - it tells you where you should add more instrumentation. Look for the largest gaps between consecutive sibling spans and add custom spans there.
Q2: You set sample_rate=0.01 (1% sampling). A critical bug causes errors on 5 requests per second. How many error traces do you capture?
With TraceIdRatioBased(0.01), only 1% of traces are kept - including error traces. If errors occur at 5/sec, you capture approximately 0.05 error traces per second, or about 3 per minute. This is the key weakness of head-based sampling. The solution is tail-based sampling (using the OTel Collector's tail_sampling processor), which makes the sampling decision after the trace completes and can apply a policy like "always keep error traces, sample 1% of success traces." Alternatively, many teams combine both: ParentBased(TraceIdRatioBased(0.01)) for normal traffic, plus a separate error rate alert in Prometheus that fires without tracing data.
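The numbers in this answer follow from two short formulas: the expected count of captured traces, and the chance of capturing at least one. The helper names below are made up for this worked example, and the second formula assumes each trace is sampled independently.

```python
def expected_traces_per_minute(events_per_second: float, sample_rate: float) -> float:
    """Expected number of sampled traces in one minute."""
    return events_per_second * sample_rate * 60


def chance_of_at_least_one(event_count: int, sample_rate: float) -> float:
    """Probability that at least one of event_count traces is sampled."""
    return 1.0 - (1.0 - sample_rate) ** event_count


per_minute = expected_traces_per_minute(5, 0.01)          # errors at 5/s, 1% sampling
first_ten_seconds = chance_of_at_least_one(5 * 10, 0.01)  # 50 errors
first_minute = chance_of_at_least_one(5 * 60, 0.01)       # 300 errors
print(f"{per_minute:.1f} error traces/min expected")
print(f"P(at least one in 10s) = {first_ten_seconds:.2f}")
print(f"P(at least one in 60s) = {first_minute:.2f}")
```

For a rare error (say one per minute), the same formulas show you could wait a long time before a single example is captured, which is the practical argument for tail-based sampling.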
Q3: Two developers disagree about whether to put user_id in span attributes or in baggage. Who is right?
Both approaches have valid use cases. Span attributes attach the value to a specific span - it is visible in the Jaeger trace for that service only. Baggage propagates the value to all downstream services in the trace, without each service needing to explicitly pass it. If user_id should be available to every service in the call chain (e.g., for security auditing or per-user SLOs), put it in baggage. Note that baggage is not copied onto spans automatically: each service that wants user_id on its spans must read it from baggage and set it as a span attribute. If user_id is only relevant to the service that has it (e.g., the authentication service), put it only in span attributes. The caution with baggage: it adds to every HTTP request header, and it is visible to all downstream services including third-party ones, so sensitive data should never go in baggage. If your user IDs count as personal data, that rules baggage out.
Q4: How does ParentBased(root=TraceIdRatioBased(0.10)) behave differently from TraceIdRatioBased(0.10) alone?
TraceIdRatioBased(0.10) on its own decides purely from the trace ID and ignores the sampled flag set by the upstream service. If the upstream service sampled the trace (flagged in traceparent) but this service's sampler decides not to record, for example because it runs a lower ratio, the trace is broken: you lose the downstream portion. ParentBased wraps the sampler: if there is a sampled parent context (upstream said "sample this"), ParentBased always continues sampling. If there is an unsampled parent context, ParentBased always drops. Only for root spans (no parent) does it delegate to the wrapped sampler (TraceIdRatioBased(0.10)). This ensures trace continuity: once a trace is sampled at the entry point, it stays sampled through all downstream services.
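The default ParentBased behaviour described here fits in three lines of logic. The sketch below is a plain-Python model of that decision, not the SDK class; `parent_based_decision` is a made-up name, and the real sampler also lets you override what happens for each kind of parent.

```python
from typing import Callable, Optional


def parent_based_decision(
    parent_sampled: Optional[bool],
    trace_id: int,
    root_sampler: Callable[[int], bool],
) -> bool:
    """Model of the default ParentBased rules.

    parent_sampled is None for a root span (no parent), otherwise the
    sampled flag carried by the parent context.
    """
    if parent_sampled is None:
        return root_sampler(trace_id)  # root span: apply our own rate
    return parent_sampled              # child span: follow the parent


def one_in_ten(trace_id: int) -> bool:
    """Toy root sampler: keeps IDs divisible by 10 (illustrative only)."""
    return trace_id % 10 == 0


root_kept = parent_based_decision(None, 20, one_in_ten)
child_of_sampled = parent_based_decision(True, 7, one_in_ten)
child_of_unsampled = parent_based_decision(False, 20, one_in_ten)
```

The last case is the one people often get wrong: an unsampled parent means the span is dropped even when the local root sampler would have kept that trace ID.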
Q5: You have a Python service that processes Kafka messages. Each message processing starts a new trace. After six months, you realise you cannot correlate message processing traces with the API request traces that produced the messages. How do you fix this going forward?
The fix is trace context injection at produce time and extraction at consume time, as shown in this lesson. The producer injects the current span's context into Kafka message headers (traceparent, tracestate). The consumer extracts that context and starts its span with context=extracted_context. The consumer span then appears as a child of the producer span in the trace, even though they ran asynchronously at different times. In Jaeger, the trace shows the full causal chain: HTTP request → Kafka produce → (async gap) → Kafka consume → processing pipeline. For existing messages that were produced without trace headers, you can only add this going forward - you cannot retroactively link them.
